{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "5oBVJ2ed43jw"
   },
   "source": [
    "##### Copyright 2020 The TensorFlow Authors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "cellView": "form",
    "colab": {},
    "colab_type": "code",
    "id": "c4WWFlX8Qrt4"
   },
   "outputs": [],
   "source": [
    "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "EBOoEdK9xfGx"
   },
   "source": [
    "# Masking and padding with Keras"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tsVcQxgiFeax"
   },
   "source": [
    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://www.tensorflow.org/guide/keras/masking_and_padding\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/keras-team/keras-io/blob/master/tf/masking_and_padding.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://github.com/keras-team/keras-io/blob/master/guides/understanding_masking_and_padding.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://storage.googleapis.com/tensorflow_docs/keras-io/tf/masking_and_padding.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0Tese2RQ3ouf"
   },
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "r5Cersl1zKJv"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow import keras\n",
    "from tensorflow.keras import layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "GvYwBa0gxuFE"
   },
   "source": [
    "## Introduction\n",
    "\n",
    "**Masking** is a way to tell sequence-processing layers that certain timesteps\n",
    "in an input are missing, and thus should be skipped when processing the data.\n",
    "\n",
    "**Padding** is a special form of masking were the masked steps are at the start or at\n",
    "the beginning of a sequence. Padding comes from the need to encode sequence data into\n",
    "contiguous batches: in order to make all sequences in a batch fit a given standard\n",
    "length, it is necessary to pad or truncate some sequences.\n",
    "\n",
    "Let's take a close look."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "v4OQF72TIQ5Z"
   },
   "source": [
    "## Padding sequence data\n",
    "\n",
    "When processing sequence data, it is very common for individual samples to have\n",
    "different lengths. Consider the following example (text tokenized as words):\n",
    "\n",
    "```\n",
    "[\n",
    "  [\"Hello\", \"world\", \"!\"],\n",
    "  [\"How\", \"are\", \"you\", \"doing\", \"today\"],\n",
    "  [\"The\", \"weather\", \"will\", \"be\", \"nice\", \"tomorrow\"],\n",
    "]\n",
    "```\n",
    "\n",
    "After vocabulary lookup, the data might be vectorized as integers, e.g.:\n",
    "\n",
    "```\n",
    "[\n",
    "  [71, 1331, 4231]\n",
    "  [73, 8, 3215, 55, 927],\n",
    "  [83, 91, 1, 645, 1253, 927],\n",
    "]\n",
    "```\n",
    "\n",
    "The data is a nested list where individual samples have length 3, 5, and 6,\n",
    "respectively. Since the input data for a deep learning model must be a single tensor\n",
    "(of shape e.g. `(batch_size, 6, vocab_size)` in this case), samples that are shorter\n",
    "than the longest item need to be padded with some placeholder value (alternatively,\n",
    "one might also truncate long samples before padding short samples).\n",
    "\n",
    "Keras provides a utility function to truncate and pad Python lists to a common length:\n",
    "`tf.keras.preprocessing.sequence.pad_sequences`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "kRSRhyWOTjEe"
   },
   "outputs": [],
   "source": [
    "raw_inputs = [\n",
    "    [711, 632, 71],\n",
    "    [73, 8, 3215, 55, 927],\n",
    "    [83, 91, 1, 645, 1253, 927],\n",
    "]\n",
    "\n",
    "# By default, this will pad using 0s; it is configurable via the\n",
    "# \"value\" parameter.\n",
    "# Note that you could \"pre\" padding (at the beginning) or\n",
    "# \"post\" padding (at the end).\n",
    "# We recommend using \"post\" padding when working with RNN layers\n",
    "# (in order to be able to use the\n",
    "# CuDNN implementation of the layers).\n",
    "padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(\n",
    "    raw_inputs, padding=\"post\"\n",
    ")\n",
    "print(padded_inputs)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wzpJh73Dt4Q7"
   },
   "source": [
    "## Masking\n",
    "\n",
    "Now that all samples have a uniform length, the model must be informed that some part\n",
    "of the data is actually padding and should be ignored. That mechanism is **masking**.\n",
    "\n",
    "There are three ways to introduce input masks in Keras models:\n",
    "\n",
    "- Add a `keras.layers.Masking` layer.\n",
    "- Configure a `keras.layers.Embedding` layer with `mask_zero=True`.\n",
    "- Pass a `mask` argument manually when calling layers that support this argument (e.g.\n",
    "RNN layers)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gy7L4s70ZqpQ"
   },
   "source": [
    "## Mask-generating layers: `Embedding` and `Masking`\n",
    "\n",
    "Under the hood, these layers will create a mask tensor (2D tensor with shape `(batch,\n",
    "sequence_length)`), and attach it to the tensor output returned by the `Masking` or\n",
    "`Embedding` layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "V0o8DBr9Q95D"
   },
   "outputs": [],
   "source": [
    "embedding = layers.Embedding(input_dim=5000, output_dim=16, mask_zero=True)\n",
    "masked_output = embedding(padded_inputs)\n",
    "\n",
    "print(masked_output._keras_mask)\n",
    "\n",
    "masking_layer = layers.Masking()\n",
    "# Simulate the embedding lookup by expanding the 2D input to 3D,\n",
    "# with embedding dimension of 10.\n",
    "unmasked_embedding = tf.cast(\n",
    "    tf.tile(tf.expand_dims(padded_inputs, axis=-1), [1, 1, 10]), tf.float32\n",
    ")\n",
    "\n",
    "masked_embedding = masking_layer(unmasked_embedding)\n",
    "print(masked_embedding._keras_mask)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "lBPHoQeUnNBk"
   },
   "source": [
    "As you can see from the printed result, the mask is a 2D boolean tensor with shape\n",
    "`(batch_size, sequence_length)`, where each individual `False` entry indicates that\n",
    "the corresponding timestep should be ignored during processing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "mPhq3RfDAtPB"
   },
   "source": [
    "## Mask propagation in the Functional API and Sequential API\n",
    "\n",
    "When using the Functional API or the Sequential API, a mask generated by an `Embedding`\n",
    "or `Masking` layer will be propagated through the network for any layer that is\n",
    "capable of using them (for example, RNN layers). Keras will automatically fetch the\n",
    "mask corresponding to an input and pass it to any layer that knows how to use it.\n",
    "\n",
    "For instance, in the following Sequential model, the `LSTM` layer will automatically\n",
    "receive a mask, which means it will ignore padded values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "4dLZUGISFVAW"
   },
   "outputs": [],
   "source": [
    "model = keras.Sequential(\n",
    "    [layers.Embedding(input_dim=5000, output_dim=16, mask_zero=True), layers.LSTM(32),]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "cJ0sneMbeyDy"
   },
   "source": [
    "This is also the case for the following Functional API model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "sRIRwwVv8R8D"
   },
   "outputs": [],
   "source": [
    "inputs = keras.Input(shape=(None,), dtype=\"int32\")\n",
    "x = layers.Embedding(input_dim=5000, output_dim=16, mask_zero=True)(inputs)\n",
    "outputs = layers.LSTM(32)(x)\n",
    "\n",
    "model = keras.Model(inputs, outputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "t439AapvLGbE"
   },
   "source": [
    "## Passing mask tensors directly to layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UB8jymZONw2m"
   },
   "source": [
    "Layers that can handle masks (such as the `LSTM` layer) have a `mask` argument in their\n",
    "`__call__` method.\n",
    "\n",
    "Meanwhile, layers that produce a mask (e.g. `Embedding`) expose a `compute_mask(input,\n",
    "previous_mask)` method which you can call.\n",
    "\n",
    "Thus, you can pass the output of the `compute_mask()` method of a mask-producing layer\n",
    "to the `__call__` method of a mask-consuming layer, like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "R5pIXpXd8BFp"
   },
   "outputs": [],
   "source": [
    "class MyLayer(layers.Layer):\n",
    "    def __init__(self, **kwargs):\n",
    "        super(MyLayer, self).__init__(**kwargs)\n",
    "        self.embedding = layers.Embedding(input_dim=5000, output_dim=16, mask_zero=True)\n",
    "        self.lstm = layers.LSTM(32)\n",
    "\n",
    "    def call(self, inputs):\n",
    "        x = self.embedding(inputs)\n",
    "        # Note that you could also prepare a `mask` tensor manually.\n",
    "        # It only needs to be a boolean tensor\n",
    "        # with the right shape, i.e. (batch_size, timesteps).\n",
    "        mask = self.embedding.compute_mask(inputs)\n",
    "        output = self.lstm(x, mask=mask)  # The layer will ignore the masked values\n",
    "        return output\n",
    "\n",
    "\n",
    "layer = MyLayer()\n",
    "x = np.random.random((32, 10)) * 100\n",
    "x = x.astype(\"int32\")\n",
    "layer(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "QQgXq29dWTxq"
   },
   "source": [
    "## Supporting masking in your custom layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "7rPlRfrYVkdQ"
   },
   "source": [
    "Sometimes, you may need to write layers that generate a mask (like `Embedding`), or\n",
    "layers that need to modify the current mask.\n",
    "\n",
    "For instance, any layer that produces a tensor with a different time dimension than its\n",
    "input, such as a `Concatenate` layer that concatenates on the time dimension, will\n",
    "need to modify the current mask so that downstream layers will be able to properly\n",
    "take masked timesteps into account.\n",
    "\n",
    "To do this, your layer should implement the `layer.compute_mask()` method, which\n",
    "produces a new mask given the input and the current mask.\n",
    "\n",
    "Here is an example of a `TemporalSplit` layer that needs to modify the current mask."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "qAPVJw9QwI2v"
   },
   "outputs": [],
   "source": [
    "class TemporalSplit(keras.layers.Layer):\n",
    "    \"\"\"Split the input tensor into 2 tensors along the time dimension.\"\"\"\n",
    "\n",
    "    def call(self, inputs):\n",
    "        # Expect the input to be 3D and mask to be 2D, split the input tensor into 2\n",
    "        # subtensors along the time axis (axis 1).\n",
    "        return tf.split(inputs, 2, axis=1)\n",
    "\n",
    "    def compute_mask(self, inputs, mask=None):\n",
    "        # Also split the mask into 2 if it presents.\n",
    "        if mask is None:\n",
    "            return None\n",
    "        return tf.split(mask, 2, axis=1)\n",
    "\n",
    "\n",
    "first_half, second_half = TemporalSplit()(masked_embedding)\n",
    "print(first_half._keras_mask)\n",
    "print(second_half._keras_mask)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "7KXNQdjKXQzf"
   },
   "source": [
    "Here is another example of a `CustomEmbedding` layer that is capable of generating a\n",
    "mask from input values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "EwOKTtEaPRWB"
   },
   "outputs": [],
   "source": [
    "class CustomEmbedding(keras.layers.Layer):\n",
    "    def __init__(self, input_dim, output_dim, mask_zero=False, **kwargs):\n",
    "        super(CustomEmbedding, self).__init__(**kwargs)\n",
    "        self.input_dim = input_dim\n",
    "        self.output_dim = output_dim\n",
    "        self.mask_zero = mask_zero\n",
    "\n",
    "    def build(self, input_shape):\n",
    "        self.embeddings = self.add_weight(\n",
    "            shape=(self.input_dim, self.output_dim),\n",
    "            initializer=\"random_normal\",\n",
    "            dtype=\"float32\",\n",
    "        )\n",
    "\n",
    "    def call(self, inputs):\n",
    "        return tf.nn.embedding_lookup(self.embeddings, inputs)\n",
    "\n",
    "    def compute_mask(self, inputs, mask=None):\n",
    "        if not self.mask_zero:\n",
    "            return None\n",
    "        return tf.not_equal(inputs, 0)\n",
    "\n",
    "\n",
    "layer = CustomEmbedding(10, 32, mask_zero=True)\n",
    "x = np.random.random((3, 10)) * 9\n",
    "x = x.astype(\"int32\")\n",
    "\n",
    "y = layer(x)\n",
    "mask = layer.compute_mask(x)\n",
    "\n",
    "print(mask)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Ete4q50hoKfp"
   },
   "source": [
    "## Opting-in to mask propagation on compatible layers\n",
    "\n",
    "Most layers don't modify the time dimension, so don't need to modify the current mask.\n",
    "However, they may still want to be able to **propagate** the current mask, unchanged,\n",
    "to the next layer. **This is an opt-in behavior.** By default, a custom layer will\n",
    "destroy the current mask (since the framework has no way to tell whether propagating\n",
    "the mask is safe to do).\n",
    "\n",
    "If you have a custom layer that does not modify the time dimension, and if you want it\n",
    "to be able to propagate the current input mask, you should set `self.supports_masking\n",
    "= True` in the layer constructor. In this case, the default behavior of\n",
    "`compute_mask()` is just pass the current mask through.\n",
    "\n",
    "Here's an example of a layer that is whitelisted for mask propagation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "NebmTdtvdDvm"
   },
   "outputs": [],
   "source": [
    "class MyActivation(keras.layers.Layer):\n",
    "    def __init__(self, **kwargs):\n",
    "        super(MyActivation, self).__init__(**kwargs)\n",
    "        # Signal that the layer is safe for mask propagation\n",
    "        self.supports_masking = True\n",
    "\n",
    "    def call(self, inputs):\n",
    "        return tf.nn.relu(inputs)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "dCTi4XjMPgx3"
   },
   "source": [
    "You can now use this custom layer in-between a mask-generating layer (like `Embedding`)\n",
    "and a mask-consuming layer (like `LSTM`), and it will pass the mask along so that it\n",
    "reachs the mask-consuming layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "2suiga4WROxA"
   },
   "outputs": [],
   "source": [
    "inputs = keras.Input(shape=(None,), dtype=\"int32\")\n",
    "x = layers.Embedding(input_dim=5000, output_dim=16, mask_zero=True)(inputs)\n",
    "x = MyActivation()(x)  # Will pass the mask along\n",
    "print(\"Mask found:\", x._keras_mask)\n",
    "outputs = layers.LSTM(32)(x)  # Will receive the mask\n",
    "\n",
    "model = keras.Model(inputs, outputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hMD6zZotMDRJ"
   },
   "source": [
    "## Writing layers that need mask information\n",
    "\n",
    "Some layers are mask *consumers*: they accept a `mask` argument in `call` and use it to\n",
    "determine whether to skip certain time steps.\n",
    "\n",
    "To write such a layer, you can simply add a `mask=None` argument in your `call`\n",
    "signature. The mask associated with the inputs will be passed to your layer whenever\n",
    "it is available.\n",
    "\n",
    "Here's a simple example below: a layer that computes a softmax over the time dimension\n",
    "(axis 1) of an input sequence, while discarding masked timesteps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code",
    "id": "nq4XJuNOZu3H"
   },
   "outputs": [],
   "source": [
    "class TemporalSoftmax(keras.layers.Layer):\n",
    "    def call(self, inputs, mask=None):\n",
    "        broadcast_float_mask = tf.expand_dims(tf.cast(mask, \"float32\"), -1)\n",
    "        inputs_exp = tf.exp(inputs) * broadcast_float_mask\n",
    "        inputs_sum = tf.reduce_sum(inputs * broadcast_float_mask, axis=1, keepdims=True)\n",
    "        return inputs_exp / inputs_sum\n",
    "\n",
    "\n",
    "inputs = keras.Input(shape=(None,), dtype=\"int32\")\n",
    "x = layers.Embedding(input_dim=10, output_dim=32, mask_zero=True)(inputs)\n",
    "x = layers.Dense(1)(x)\n",
    "outputs = TemporalSoftmax()(x)\n",
    "\n",
    "model = keras.Model(inputs, outputs)\n",
    "y = model(np.random.randint(0, 10, size=(32, 100)), np.random.random((32, 100, 1)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "IeNtOzPSssZd"
   },
   "source": [
    "## Summary\n",
    "\n",
    "That is all you need to know about padding & masking in Keras. To recap:\n",
    "\n",
    "- \"Masking\" is how layers are able to know when to skip / ignore certain timesteps in\n",
    "sequence inputs.\n",
    "- Some layers are mask-generators: `Embedding` can generate a mask from input values\n",
    "(if `mask_zero=True`), and so can the `Masking` layer.\n",
    "- Some layers are mask-consumers: they expose a `mask` argument in their `__call__`\n",
    "method. This is the case for RNN layers.\n",
    "- In the Functional API and Sequential API, mask information is propagated\n",
    "automatically.\n",
    "- When using layers in a standalone way, you can pass the `mask` arguments to layers\n",
    "manually.\n",
    "- You can easily write layers that modify the current mask, that generate a new mask,\n",
    "or that consume the mask associated with the inputs."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "masking_and_padding.ipynb",
   "private_outputs": true,
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}